SMOOTHING 3-D DATA FOR TORPEDO PATHS



I. THE GENERAL PROBLEM 

A. Data 

Data in the form of ordered quadruplets ($t_i$, $x_i$, $y_i$, $z_i$) are
available from 3-D files on torpedo and target paths. The times $t_i$ are
sufficiently accurate that they can be assumed to be without error. The spatial
coordinates $x_i$, $y_i$, and $z_i$, however, are not only subject to measurement
errors, but may also contain erratic measurements or have measurements missing for
some of the equally spaced time intervals.

B. Desired Output 

Information to be extracted from this data can be obtained either as: 

(1) smoothed information as a function of time (parametric form), or 

(2) smoothed information at a particular sequence of times which can be 

specified. 

A comparison of the computational requirements of the two procedures will involve the
length of the intervals used in smoothing and the number of times in the sequence of
times of interest. Both procedures involve the same smoothing techniques.

The information to be extracted from the 3-D data includes: 

(1) smoothed position coordinates 

(a) as functions of time (i.e., $x = f_x(t)$, $y = f_y(t)$, $z = f_z(t)$)

(b) at specified times $t_j$ (i.e., $x(t_j)$, $y(t_j)$, $z(t_j)$).

(3) velocity component estimates

(a) as functions of time (i.e., $V_x(t)$, $V_y(t)$, $V_z(t)$)

(b) at specified times $t_j$ (i.e., $V_x(t_j)$, $V_y(t_j)$, $V_z(t_j)$)

(4) relative torpedo and target geometry in vicinity of intercept. 

C. Data Sample 

The path of the torpedo involves maneuvers, so segments must be selected for
application of the smoothing technique. The lengths of the segments, and hence the
number of possible data points, are open to selection. Curves to be used to fit the
data will primarily be polynomials. Longer path segments will generally require higher
order polynomials and be more difficult to fit with acceptably small residuals. On the
other hand, short intervals contain fewer data points and can limit the capability for
reducing prediction errors. The trade-off must be resolved by considering potential
paths and measurement errors. Some indication will be presented in subsequent sections
of this report, where data for a specific torpedo path are analyzed. Initially, two
sample sizes (n=11 and n=21) are considered.

One of the questionable features of small sample sizes is the possible further
reduction by deletion of data points which appear inconsistent with the remaining data.
II. DATA SMOOTHING



A. Methodology 

The data smoothing considered in this report is limited to the method of 
least squares. Other methods such as Kalman filtering would be appropriate for real time 
data smoothing where interest is centered on the next data point following the data used 
in the smoothing, but the current status of the method is not appropriate for 
postexperimental application where times within the data sample are of interest. 

The data smoothing techniques currently used at IVPS involve the least 
squares method with the following equations: 

(1) $x(t) = a + bt$ (linear)

(2) $x(t) = a + bt + ct^2$ (quadratic, parabolic)

(3) $x(t) = a + b \ln(t)$ (logarithmic).

This report concentrates on the addition of higher order polynomials, in particular:

(4) $x(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3$ (cubic)

(5) $x(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4$ (quartic).

The linear least squares technique is described in Appendix A. The sum of
squares of the residuals

$$D = \sum_{i=1}^{n} \left[ x_i - \hat{x}(t_i) \right]^2,$$

where $\hat{x}(t_i)$ is the fitted value at time $t_i$, provides a basis for selection
of the particular equation to be used in fitting a particular set of data. The statistic

$$S_e^2 = D/(n-k),$$

where $n$ is the number of points in the sample and $k$ is the number of parameters in
the equation, provides an estimate of the variance of measurement errors.
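
As an illustration only, the fit and the statistic described above can be sketched in a few lines of Python. NumPy's `polyfit` stands in for the procedure of Appendix A (which is not reproduced here), and the function name `fit_polynomial` and the sample data are hypothetical.

```python
import numpy as np

def fit_polynomial(t, x, order):
    """Least-squares polynomial fit; returns (coeffs, D, S_e).

    coeffs are ordered highest power first (NumPy convention).
    D is the sum of squared residuals and S_e = sqrt(D / (n - k)),
    with k = order + 1 fitted parameters.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    k = order + 1
    if len(t) <= k:
        raise ValueError("need more data points than parameters")
    coeffs = np.polyfit(t, x, order)
    residuals = x - np.polyval(coeffs, t)
    D = float(np.sum(residuals ** 2))
    S_e = float(np.sqrt(D / (len(t) - k)))
    return coeffs, D, S_e

# Hypothetical noise-free quadratic path sampled at 11 equally spaced times.
t = np.arange(-5, 6)
x = 3.0 + 2.0 * t + 0.5 * t ** 2

coeffs_lin, D_lin, S_lin = fit_polynomial(t, x, 1)     # too low an order
coeffs_quad, D_quad, S_quad = fit_polynomial(t, x, 2)  # matches the path
```

With these data the linear fit leaves a clearly nonzero $S_e$, while the quadratic fit drives it to essentially zero, which is the behavior used in this report to choose between candidate equations.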

B. Sequential Differences 

A preliminary screening of sample data by successive differences can serve 
a dual purpose: 

(1) indication of the order of the polynomial required to produce a 
reasonable fit, and 

(2) indication of isolated wild data points (outliers).

The first through fourth successive differences are presented in Table 1 when the actual
relationship of x to t is linear and in Table 2 when the relationship is quadratic. A
perturbation $d$ is introduced in one of the observations $x_i$.

There are several salient features of successive differences that should be noted:

(1) Ignore, for the moment, the perturbation. In Table 1, the first
differences (the $\Delta_{1i}$'s) consist of the velocity term $a_1$ plus noise. If
$a_1$ is large with respect to the noise (the $n_i$'s), these differences will all have
the same sign. The second differences (the $\Delta_{2i}$'s), however, involve only noise
and their signs should be random. This change from consistent signs for the
$\Delta_{1i}$'s to random signs for the $\Delta_{2i}$'s is an indication that a linear
relationship of x to t is appropriate.



Table 2. Sequential Differences - Quadratic Case

In passing, it should be noted that $\Delta_{1i}$ is normally distributed, i.e.,

$$\Delta_{1i} \sim N(a_1, 2\sigma^2),$$

where $\sigma^2$ is the variance of the measurement noise and the observations are
taken at unit time spacing.

It should also be noted that if $a_1$ is not large with respect to $\sigma$, the signs
of the $\Delta_{1i}$'s can still have the sign of $a_1$, with the dominance of this sign
depending upon the relative magnitudes of $a_1$ and $\sigma$.

Next, consider the quadratic case (Table 2). The $\Delta_{3i}$'s have random
signs and the $\Delta_{2i}$'s are dominated by the sign of $a_2$, and hence the
quadratic is indicated as the appropriate polynomial. Note that the signs of the
$\Delta_{1i}$'s may also be the same for all $i$ if $a_1$ and $a_2$ have the same sign.
If $a_1$ and $a_2$ have opposite signs and $|a_1|$ is greater than $|a_2|$, then there
can be a change in the sign of the $\Delta_{1i}$'s where $a_1 + (i^2 - (i-1)^2) a_2$
changes sign. In the vicinity of this point the $n_i$'s can become significant and
produce some random sign terms.

Higher order differences are required to deal with higher order polynomials.
In general, random signs in the $(k+1)$st order differences and consistent signs in the
$k$th order differences indicate selection of a $k$th order polynomial to fit the data.

(2) The perturbation $d$ was included to provide an examination of the effect
of an isolated outlier on successive differences. For illustrative purposes, it will be
assumed that a successive difference greater than three times the standard deviation of
the noise in that difference will be considered as an indication that a perturbation
exists. The value $\sigma = 4$ will also be used for illustrative purposes.

Now, note the entries in the lower part of Table 1. Unless $a_1$ is known (or
estimated), a critical magnitude for the $\Delta_{1i}$'s cannot be specified. For higher
order differences, the $i$th difference of the $j$th order ($\Delta_{ji}$) has a normal
distribution,

$$\Delta_{ji} \sim N(k_{ji} d, \sigma_j^2),$$

where $k_{ji}$ is the coefficient of $d$ in $\Delta_{ji}$ and $\sigma_j^2$ is the
variance of the noise in a $j$th order difference. If $d = 0$ then:

$$\Delta_{ji} \sim N(0, \sigma_j^2).$$

The situation is an application of statistical hypothesis testing. If $\Delta_{ji}$ is
larger in magnitude than can be expected due to noise alone, then the presence of a
perturbation (an outlier) is indicated. The critical magnitude, using the assumptions of
1.0 - 0.99 = 0.01 as significance level and $\sigma = 4$, is presented in the last row
of Table 1. Thus if a difference exceeds in magnitude the critical value for its order
(the values 17, 18, and 17 are listed), for any $i$, then an outlier is indicated.

Note that the value $\sigma = 4$ was assumed for this illustration. If sequential
differences are used for preliminary screening before least squares curve fitting is
performed, the estimate $S_e$ for $\sigma$ will not be available. A value of $\sigma$
may be assumed from prior information on measurement errors, but for purposes of
preliminary screening some value greater than 4 would permit elimination of data points
with large perturbations.



It should be emphasized that the above discussion pertains to the simplest
situations. For applications where there are missing data points, or where perturbations
are not isolated, more guidance will be required. The assumption that the noise
components (the $n_i$'s) are independent and have the same variance also warrants
reservations in applications of the models.
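
The screening idea of this section can be sketched as follows. This is a minimal illustration, not the procedure of Tables 1 and 2: it assumes independent noise of equal variance $\sigma^2$ at unit time spacing, in which case the $j$th order difference has variance $\binom{2j}{j}\sigma^2$ (the sum of the squared binomial coefficients). The function names, the three-standard-deviation multiple, and the sample path are hypothetical.

```python
import numpy as np
from math import comb, sqrt

def successive_differences(x, max_order=4):
    """Return {j: array of j-th order successive differences}, j = 1..max_order."""
    x = np.asarray(x, dtype=float)
    return {j: np.diff(x, n=j) for j in range(1, max_order + 1)}

def flag_large_differences(x, sigma, orders=(2, 3, 4), multiple=3.0):
    """Indices of differences larger than `multiple` standard deviations.

    Only orders above the polynomial degree of the path are meaningful,
    since lower-order differences carry the velocity or acceleration terms.
    """
    diffs = successive_differences(x, max(orders))
    flags = {}
    for j in orders:
        limit = multiple * sigma * sqrt(comb(2 * j, j))
        flags[j] = np.flatnonzero(np.abs(diffs[j]) > limit)
    return flags

# Hypothetical noise-free linear path with one perturbed observation (d = 60).
x = 100.0 + 5.0 * np.arange(11)
x[5] += 60.0

diffs = successive_differences(x)
flags = flag_large_differences(x, sigma=4.0)
```

For this path the second differences around the perturbed point are $+d$, $-2d$, $+d$, so an isolated outlier shows up as a short run of large differences with alternating signs.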



III. APPLICATION



A. Sample Data 

A specific test in which a torpedo was launched against a submarine at the
Naval Undersea Warfare Engineering facilities will be used for illustration. The 3-D data
include equally spaced times from 814 to 1000; very few data points are missing.
Figure 1 shows the torpedo path with every fifth point. Segments of this torpedo path are
selected for application of the methodology presented in Section II. The presentation is
restricted to the x and y coordinates.

B. Data Sample I 

The initial 21 points (814-834) appear to lie in a straight line in Figure 1 and
were selected as the first data sample. These data are presented in Figure 2 and Table 3.

(1) Successive differences:

The first and second order successive differences are also presented in
Table 3. For the x component, all the first differences are negative and the second
differences appear random (except possibly for the tail of the sample, where a sequence
of four pluses occurs, including one value ($\Delta_2 = +17.2$) which is large enough
that it might indicate an outlier). The alternating signs (+, -, + or -, +, -) are not
present, so an isolated outlier does not appear likely.

For the y component, all the first order successive differences are positive
and the second order differences appear somewhat random. Again, $\Delta_2 = -18.2$
indicates that something has occurred in the vicinity of $t_{18}$. Higher order
differences were not explored for this sample.

(2) Least squares smoothing:

Both linear and quadratic functions were fitted using the least squares
method outlined in Appendix A. The results are presented below:
Table 3. Successive Differences - Sample I

  i      x_i     Δ1(x)   Δ2(x)       y_i     Δ1(y)   Δ2(y)

  1    5228.6    -71.8             -3465.1   +58.1
  2    5156.8    -71.7    +0.1     -3407.0   +60.8    +2.7
  3    5085.1    -68.8    +2.9     -3346.2   +61.1    +0.3
  4    5018.3    -74.1    -5.3     -3285.1   +62.9    +1.8
  5    4944.2    -66.1    +8.0     -3222.2   +56.6    -6.3
  6    4878.1    -78.1   -12.0     -3165.6   +59.8    +3.2
  7    4800.0    -68.3    +9.8     -3105.8   +56.1    -3.7
  8    4731.7    -79.5   -11.2     -3049.7   +62.5    +6.4
  9    4652.2    -68.6    +9.9     -2987.2   +56.4    -6.1
 10    4583.6    -72.9    -4.3     -2930.8   +60.3    +3.9
 11    4510.7    -70.5    +2.4     -2870.5   +59.7    -0.6
 12    4440.2    -73.2    -2.7     -2810.8   +60.8    +1.1
 13    4367.0    -70.0    +3.2     -2750.0   +60.0    -0.8
 14    4297.0    -70.9    -0.9     -2690.0   +63.3    +3.3
 15    4226.1    -72.5    -1.6     -2626.7   +55.1    -8.2
 16    4153.6    -69.6    +2.9     -2571.6   +60.0    +4.9
 17    4084.0    -66.3    +3.3     -2511.6   +62.5    +2.5
 18    4017.7    -49.1   +17.2     -2449.1   +44.3   -18.2
 19    3968.6    -44.0    +5.1     -2404.8   +47.7    +3.4
 20    3924.6    -56.6   -12.6     -2357.1   +48.0    +0.3
 21    3868.0                      -2309.1
Linear

$x(t) = 5288.3 - 69.78t$, $S_{xe} = 16.73$

$y(t) = -3518.1 + 58.72t$, $S_{ye} = 8.33$

Quadratic

$x(t) = 5318.6 - 77.67t + 0.3588t^2$, $S_{xe} = 11.62$

$y(t) = -3532.0 + 62.33t - 0.1642t^2$, $S_{ye} = 6.30$

The residual deviations

$$e_{xi} = x_i - \hat{x}(t_i)$$

$$e_{yi} = y_i - \hat{y}(t_i)$$

are shown in Figure 3. Note that there is a definite trend in these residuals starting
about time $t_{18}$. Note also the general trend of the residuals, with a small random
pattern superimposed on a curve for each residual set. Higher order polynomials could be
used to remove the general curve (this was not explored). Note, further, that no violent
outliers are indicated. The fitted linear function is shown in Figure 2, and the observed
and predicted values for $x_i$ and $y_i$ are presented in Tables 4a and 4b together with
the residuals in these components and the deviation

$$\epsilon_i = \sqrt{e_{xi}^2 + e_{yi}^2}.$$

The sequences of signs observed in Table 4a for the $e_{xi}$'s and $e_{yi}$'s are of
interest. There is a sequence of +'s, followed by a sequence of -'s, and ending with a
sequence of +'s for the $e_{xi}$'s. Similarly, there is a sequence of -'s, followed by a
sequence of +'s, and ending with a sequence of -'s for the $e_{yi}$'s. (The sign of
$e_{y8}$ can be ignored or changed since the magnitude of $e_{y8}$ is small.)
Figure 3. Least squares residuals, Sample I.
Table 4a. Linear Regression - Sample I

  i      x_i    x^(t_i)    e_xi       y_i    y^(t_i)    e_yi     ε_i

  1    5228.6   5218.5    +10.1    -3465.1   -3459.4    -5.7    11.6
  2    5156.8   5148.8     +8.0    -3407.0   -3400.7    -6.3    10.2
  3    5085.1   5079.0     +6.1    -3346.2   -3342.0    -4.2     7.4
  4    5018.3   5009.2     +9.1    -3285.1   -3283.2    -1.9     9.3
  5    4944.2   4939.4     +4.8    -3222.2   -3224.5    +2.3     5.3
  6    4878.1   4869.7     +8.4    -3165.6   -3165.8    +0.2     8.4
  7    4800.0   4799.9     +0.1    -3105.8   -3107.1    +1.3     1.3
  8    4731.7   4730.1     +1.6    -3049.7   -3048.4    -1.3     2.1
  9    4652.2   4660.3     -8.1    -2987.2   -2989.6    +2.4     8.5
 10    4583.6   4590.6     -7.0    -2930.8   -2930.9    +0.1     7.0
 11    4510.7   4520.8    -10.1    -2870.5   -2872.2    +1.7    10.2
 12    4440.2   4451.0    -10.8    -2810.5   -2813.5    +3.0    11.2
 13    4367.0   4381.2    -14.2    -2750.0   -2754.8    +4.8    15.0
 14    4297.0   4311.4    -14.4    -2690.3   -2696.0    +5.7    15.5
 15    4226.1   4241.7    -15.6    -2626.7   -2637.3    +9.6    18.3
 16    4153.6   4171.9    -18.3    -2571.6   -2578.6    +7.0    19.6
 17    4084.0   4102.1    -18.1    -2511.6   -2519.9    +8.3    19.9
 18    4017.7   4032.3    -14.6    -2449.1   -2461.2   +12.1    19.0
 19    3968.6   3962.5     +6.1    -2404.8   -2402.4    -2.4     6.6
 20    3924.6   3892.8    +31.8    -2357.1   -2343.7   -13.4    34.5
 21    3868.0   3823.0    +45.0    -2309.1   -2285.0   -24.1    51.1
Table 4b. Quadratic Regression - Sample I

  i      x_i    x^(t_i)    e_xi       y_i    y^(t_i)    e_yi     ε_i

  1    5228.6   5241.3    -12.7    -3465.1   -3469.8    +4.7    13.5
  2    5156.8   5164.7     -7.9    -3407.0   -3408.0    +1.0     8.0
  3    5085.1   5088.8     -3.7    -3346.2   -3346.4    +0.2     3.7
  4    5018.3   5013.6     +4.7    -3285.1   -3285.3    +0.2     4.7
  5    4944.2   4939.2     +5.0    -3222.2   -3224.4    +2.2     5.5
  6    4878.1   4865.5    +12.6    -3165.6   -3163.9    -1.7    12.7
  7    4800.0   4792.5     +7.5    -3105.8   -3103.7    -2.1     7.8
  8    4731.7   4720.2    +11.5    -3049.7   -3043.8    -5.8    12.9
  9    4652.2   4648.6     +3.6    -2987.2   -2984.3    -2.9     4.6
 10    4583.6   4577.8     +5.8    -2930.8   -2925.1    -5.7     8.1
 11    4510.7   4507.6     +3.1    -2870.5   -2866.2    -4.3     5.3
 12    4440.2   4438.2     +2.0    -2810.8   -2807.6    -3.2     3.8
 13    4367.0   4369.5     -2.5    -2750.0   -2749.4    -0.6     2.6
 14    4297.0   4301.5     -4.5    -2690.0   -2691.5    +1.5     4.7
 15    4226.1   4234.2     -8.1    -2626.7   -2633.9    +7.2    10.8
 16    4153.6   4167.7    -14.1    -2571.6   -2576.7    +5.1    15.0
 17    4084.0   4101.9    -17.9    -2511.6   -2519.7    +8.1    19.7
 18    4017.7   4036.7    -19.0    -2449.1   -2463.2   +14.1    23.7
 19    3968.6   3972.3     -3.7    -2404.8   -2406.9    +2.1     4.3
 20    3924.6   3908.7    +15.9    -2357.1   -2351.0    -6.1    17.0
 21    3868.0   3845.7    +22.3    -2309.1   -2295.4   -13.7    26.2
These sign sequences would ordinarily indicate that the next higher order
polynomial, a quadratic, should do well in reducing the residual errors. This is not
substantiated, however, as Table 4b demonstrates. The deviations in this table have four
sequences of the same sign and suggest that even a cubic polynomial will not necessarily
produce an excellent fit to the data; this was not explored further.

An alternative to using higher order polynomials is a reduction in sample
size. This alternative was explored for the sample with n=11. The results are shown
below:





                     Linear            Quadratic
Sample Points     S_xe    S_ye       S_xe    S_ye

814-824            3.3     2.0        --      --
819-829            2.9     1.9        2.1     1.8
824-834           16.4     9.5        --      --
829-839           13.9    11.1        --      --



The three basic causes for residuals are:

(a) maneuver of the object tracked (this is represented by the polynomial),

(b) noise in measurements (this is represented by $\sigma$, of which $S_e$ is
an estimate), and

(c) outliers (these will be discussed later in this report).

It is assumed that there are no outliers in Sample I. Subsample 2 (points 819
to 829) appears to be fitted quite well by a straight line, and the quadratic was applied
to give an estimate of the size of $\sigma$. The first subsample (points 814 to 824) is
fitted reasonably well by a straight line, so the quadratic was not tried. The last two
subsamples have substantially larger $S_e$'s. This could be caused by either torpedo
maneuvers or a larger noise component (larger $\sigma$); this was not explored.
C. Data Sample II

The second sample selected for study was the set with times 867 to 887.
These 21 points appear to present a curved path which might possibly be fitted by a
quadratic. First, consider the successive differences in Table 5. Some difficulty similar
to an outlier is indicated in the vicinity of $i = 6$ ($t_i = 872$). Examination of the
first successive differences shows a drop in velocity between $t_5$ and $t_6$ and only
partial recovery between $t_6$ and $t_7$. One possible explanation would be an additional
data point between $t_5$ and $t_6$. The actual explanation is the inadvertent
introduction of a measurement from a different array taken at time $t_5$ and entered as
the measurement at $t_6$. Measurements at $t_7$, and subsequent times, should be shifted
to the respective preceding times.



Instead of fitting all of Sample II, eleven points (872-882) were selected
somewhat arbitrarily for fitting by least squares; these are plotted in Figure 4. The
second differences all have the same sign and the third differences are small and have
apparently random sign. The least squares straight line fit is presented in Table 6a and
sketched in Figure 4. (Note the shift in the time scale. This was introduced to reduce
the magnitudes of the numbers calculated in determining the fitted line and $S_e$.) In
dealing with the quadratic, the means $\bar{x} = \frac{1}{n}\sum x_i$ and
$\bar{y} = \frac{1}{n}\sum y_i$ were also subtracted from each observation $x_i$ and
$y_i$, respectively, for the same reason. Table 6b presents the quadratic regression.
The reduction in the $S_e$'s is dramatic, as would be expected from Figure 4. All of the
$\epsilon_i$'s are less than 5 and hence within the residual noise that could be
expected with a $\sigma$ of 2 or 3. The signs of the $e_{xi}$'s, however, show some
indications of lack of randomness. For this reason, a third-degree polynomial was tried
for the $x_i$'s only. This produced the value $S_{xe} = 0.946$ with the maximum magnitude
of any $e_{xi}$ being 1.2. The cubic fits the data very well indeed.
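
The linear and quadratic fits of this sample can be re-traced with a short Python sketch. The x values are those listed in Table 6b; the use of NumPy's `polyfit` and the helper name `fit_with_s_e` are assumptions made for illustration, not the computation actually used for the report.

```python
import numpy as np

# x coordinates of the 11 points (872-882) as listed in Table 6b.
x = np.array([2183.2, 2241.8, 2305.5, 2377.1, 2451.6, 2533.8,
              2619.6, 2707.7, 2799.6, 2891.6, 2987.3])

# Shift the time scale so that it runs from -5 to +5; this keeps the
# intermediate sums small, as noted in the text.
t_raw = np.arange(872, 883, dtype=float)
t = t_raw - t_raw.mean()

def fit_with_s_e(t, x, order):
    """Return (coefficients, S_e); coefficients are highest power first."""
    coeffs = np.polyfit(t, x, order)
    resid = x - np.polyval(coeffs, t)
    s_e = float(np.sqrt(np.sum(resid ** 2) / (len(t) - (order + 1))))
    return coeffs, s_e

lin, s_lin = fit_with_s_e(t, x, 1)     # slope, intercept
quad, s_quad = fit_with_s_e(t, x, 2)   # t^2 coefficient, slope, intercept
```

Run on these values, the sketch should return coefficients close to those printed under Tables 6a and 6b (a slope near 81.19 and a $t^2$ coefficient near 2.057), and a much smaller $S_{xe}$ for the quadratic than for the straight line.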

D. Data Sample III

The third sample selected for study involved an S-shaped maneuver as
indicated by the 21 points (848-868) shown in Figure 5. The x and y coordinates of these
points are presented in Figure 6, where it is evident that first and second order
polynomials will not provide acceptable fits to the data. A third-order polynomial
appears possible for the $y_i$'s and a fourth order for the $x_i$'s. A subset of 11
points (851-861, or points 4-14 in Figure 6 and Table 7) will be used for illustration.
Table 5. Successive Differences - Sample II

  i     x_i    Δ1(x)   Δ2(x)   Δ3(x)       y_i    Δ1(y)    Δ2(y)    Δ3(y)

  1   2012.0   +18.0                     -1255.5   +94.2
  2   2030.0   +26.1    +8.1             -1161.3   +91.1     -3.1
  3   2056.1   +34.9    +8.8    +0.7     -1070.2   +88.1     -3.0     +0.1
  4   2091.0   +43.2    +8.3    -0.4      -982.1   +87.0     -1.1     +1.9
  5   2134.2    +8.1   -35.1   -43.4      -895.1   -20.6   -107.6   -106.5
  6   2142.3   +40.9   +32.8   +70.9      -915.7   +99.7   +120.3   +227.9
  7   2183.2   +58.6   +17.7   -15.1      -816.0   +77.7    -22.0   -142.3
  8   2241.8   +63.7    +5.1   -12.6      -738.3   +68.5     -9.2    +12.8
  9   2305.5   +71.6    +7.9    +2.8      -669.8   +65.6     -2.9     +6.3
 10   2377.1   +74.5    +2.9    -5.0      -604.2   +55.6    -10.0     -7.1
 11   2451.6   +82.2    +7.7    +4.8      -548.6   +50.8     -4.8     +5.2
 12   2533.8   +85.8    +3.6    -4.1      -497.8   +43.2     -7.6     -2.8
 13   2619.6   +88.1    +2.3    -1.3      -454.6   +34.8     -8.4     -0.8
 14   2707.7   +91.9    +3.8    +1.5      -419.8   +26.8     -8.0     +0.4
 15   2799.6   +92.0    +0.1    -3.7      -393.0   +18.4     -8.4     -0.4
 16   2891.6   +95.7    +3.7    +3.6      -374.6    +7.8    -10.6     -2.2
 17   2987.3   +95.9    +0.2    -3.5      -366.8    +0.7     -7.1     +3.5
 18   3083.2   +94.3    -1.6    -1.8      -366.1   -10.4    -11.1     -4.0
 19   3177.5   +98.7    +4.3    +5.9      -376.5   -23.2    -12.8     -1.7
 20   3276.2   +93.8    -4.9    -9.2      -399.7   -26.5     -3.3     +9.5
 21   3370.0                              -426.2
Table 6a. Linear Regression - 11 Points (872-882)

 t_i     x_i    x^(t_i)    e_xi       y_i    y^(t_i)    e_yi     ε_i

 -5    2183.2   2148.5    +34.7    -816.0    -762.0    -54.0    64.2
 -4    2241.8   2229.7    +12.1    -738.3    -716.5    -21.8    24.9
 -3    2305.5   2310.9     -5.4    -669.8    -671.1     +1.3     5.6
 -2    2377.1   2392.1    -15.0    -604.2    -625.7    +21.5    26.2
 -1    2451.6   2473.2    -21.6    -548.6    -580.2    +31.6    38.3
  0    2533.8   2554.4    -20.6    -497.8    -534.8    +37.0    42.4
  1    2619.6   2635.6    -16.0    -454.6    -489.4    +34.8    38.3
  2    2707.7   2716.8     -9.1    -419.8    -443.9    +24.1    25.7
  3    2799.6   2798.0     +1.6    -393.0    -398.5     +5.5     5.7
  4    2891.6   2879.2    +12.4    -374.6    -353.1    -21.5    24.8
  5    2987.3   2960.4    +26.9    -366.1    -307.6    -58.5    64.4

$\hat{x}(t) = 2554.4 + 81.19t$, $S_{xe} = 20.33$

$\hat{y}(t) = -534.8 + 45.43t$, $S_{ye} = 36.41$
Table 6b. Quadratic Regression - 11 Points (872-882)

 t_i     x_i    x^(t_i)    e_xi       y_i    y^(t_i)    e_yi     ε_i

 -5    2183.2   2179.3     +3.9    -816.0    -817.8     +1.8     4.3
 -4    2241.8   2242.0     -0.2    -738.3    -738.9     +0.6     0.6
 -3    2305.5   2308.8     -3.3    -669.8    -667.4     -2.4     4.1
 -2    2377.1   2379.7     -2.6    -604.2    -603.3     -0.9     2.8
 -1    2451.6   2454.7     -3.1    -548.6    -546.7     -1.9     3.6
  0    2533.8   2533.9     -0.1    -497.8    -497.6     -0.2     0.2
  1    2619.6   2617.1     +2.5    -454.6    -455.9     +1.3     2.8
  2    2707.7   2704.5     +3.2    -419.8    -421.6     +1.8     3.7
  3    2799.6   2796.0     +3.6    -393.0    -394.8     +1.8     4.0
  4    2891.6   2891.6      0.0    -374.6    -375.5     +0.8     0.8
  5    2987.3   2991.3     -4.0    -366.1    -363.5     -2.6     4.8

$\hat{x}(t) = 2533.9 + 81.19t + 2.057t^2$

$\hat{y}(t) = -497.6 + 45.43t - 3.724t^2$



The results of fitting third-degree polynomials to these 11 points are
presented in Table 8 and the fourth-degree polynomial in Table 9. The cubic equation fits
the y component quite well, but even the quartic equation leaves something to be desired
(a smaller $S_e$) for the x component. Higher order polynomials were not tried. The
estimates $S_e$ for $\sigma$ obtained by fitting polynomials to the subsample of 11
points are presented below:



Order of
Polynomial       X       Y

    1           66.8    94.5
    2           37.3    42.6
    3           34.0     3.5
    4            9.3     --




Improvement in fitting the y component by increasing the order of the 
polynomial is quite dramatic but the improvement is considerably slower for the x 
component. The third-order polynomial could be considered acceptable for y but a fifth- 
order polynomial should be tried for x. The order of polynomial used does not have to be 
the same for both components. 
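
The way $S_e$ falls with polynomial order can be re-traced with the sketch below, which fits orders 1 through 4 to the x coordinates of the 11-point subset (values as listed in Tables 7 and 8). NumPy's `polyfit` and the helper names `s_e_by_order` and `lowest_adequate_order` are assumptions for illustration; the noise level passed to the second helper is a hypothetical threshold, not a value taken from this report.

```python
import numpy as np

# x coordinates of points 851-861 of Sample III, on the shifted time scale.
t = np.arange(-5, 6, dtype=float)
x = np.array([2777.0, 2702.5, 2617.8, 2524.3, 2440.0, 2385.7,
              2369.8, 2395.5, 2406.1, 2373.1, 2305.3])

def s_e_by_order(t, x, max_order):
    """Return {order: S_e} for polynomial fits of order 1..max_order."""
    out = {}
    for order in range(1, max_order + 1):
        coeffs = np.polyfit(t, x, order)
        d = np.sum((x - np.polyval(coeffs, t)) ** 2)
        out[order] = float(np.sqrt(d / (len(t) - (order + 1))))
    return out

def lowest_adequate_order(s_e, noise_level):
    """Lowest order whose S_e is at or below the assumed noise level."""
    for order in sorted(s_e):
        if s_e[order] <= noise_level:
            return order
    return None  # no order tried was adequate

s_e = s_e_by_order(t, x, 4)
chosen = lowest_adequate_order(s_e, noise_level=10.0)
```

The computed values should be close to the X column above (roughly 66.8, 37.3, 34.0, and 9.3). Applying the same helper to the y coordinates would allow a different order to be chosen for each component.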

E. Discussion 

Only one in-water run was examined and, for it, only selected sections of the 
torpedo path were treated in any detail. Nevertheless some conclusions can be made 
about application of the Sequential Differences and Least Squares Regression techniques 
to 3-D data. 



(1) Sequential differences: 

(a) These differences provide some capability for locating isolated 
outlier points which differ substantially from the path of the object being tracked. This 
was illustrated in Sample II. The model shown in Tables 1 and 2 needs extension to higher 
order polynomial paths and multiple outliers. Also, the critical magnitudes for sequential 
differences (refer to Table 1) must be increased to allow for accelerations since the use of 
sequential differences will precede fitting a polynomial and hence the order of the fitted 
polynomial will not be known at the time. Thus sequential differences should be used only 
for a first screening for gross outliers. 
Data sample III, points 848-868.
Table 7. Successive Differences - Sample III

  i     x_i    Δ1(x)   Δ2(x)   Δ3(x)       y_i    Δ1(y)    Δ2(y)    Δ3(y)

  1   2949.3   -40.5                     -1364.0   +74.4
  2   2889.3   -56.8   -10.8             -1289.6   +74.0     -0.4
  3   2828.5   -51.5    -0.7   +10.1     -1215.6   +56.5    -17.5    -17.9
  4   2777.0   -74.5   -23.0   -22.3     -1159.1   +88.8    +32.3    +49.8
  5   2702.5   -84.7   -10.2   +12.8     -1070.3   +37.5    -51.3    -83.6
  6   2617.8   -93.5    -8.8    +1.4     -1032.8    +1.0    -36.5    +14.8
  7   2524.3   -84.3    +9.2   +18.0     -1031.8   -44.0    -45.0     -8.5
  8   2440.0   -54.3   +30.0   +20.8     -1075.8   -72.4    -28.4    +16.6
  9   2385.7   -15.9   +38.4    +8.4     -1148.2   -91.7    -19.3     +9.1
 10   2369.8   +25.7   +41.6    +3.2     -1239.9   -78.9    +12.8    +32.1
 11   2395.5   +10.6   -15.1   -56.7     -1328.8   -91.4    -12.5    -25.3
 12   2406.1   -33.0   -43.6   -28.5     -1420.2   -88.5     +2.9    +15.4
 13   2373.1   -67.8   -34.8    +8.8     -1508.7   -64.4    +24.1    +21.2
 14   2305.3   -89.2   -21.4   +13.4     -1573.1   -25.0    +39.4    +15.3
 15   2216.1   -91.4    -2.2   +19.2     -1598.1   +16.8    +41.8     +2.4
 16   2124.7   -75.7   +15.7   +17.9     -1581.3   +53.7    +36.9     -4.9
 17   2049.0   -42.2   +33.5   +17.8     -1527.6   +83.6    +29.9     -7.0
 18   2006.8    -4.5   +37.7    +4.2     -1444.0   +93.2     +9.6    -20.3
 19   2002.3    +9.7   +14.2   -23.5     -1350.8   +95.3     +2.1     -7.5
 20   2012.0   +18.0    +8.3    -5.9     -1255.5   +94.2     -1.1     -3.2
 21   2030.0                             -1161.3
Table 8. Cubic Regression - Sample III (11 points)

 t_i     x_i    x^(t_i)    e_xi       y_i     y^(t_i)    e_yi     ε_i

 -5    2777.0   2804.8    -27.8    -1159.1   -1159.0     -0.1    27.8
 -4    2702.5   2680.2    +22.3    -1070.3   -1066.7     -3.6    22.6
 -3    2617.8   2583.9    +33.9    -1032.8   -1028.6     -4.2    34.2
 -2    2524.3   2511.8    +12.5    -1031.8   -1035.5     +3.7    13.0
 -1    2440.0   2459.6    -19.6    -1075.8   -1078.3     +2.5    19.8
  0    2385.7   2423.3    -37.6    -1148.2   -1147.7     -0.5    37.6
  1    2369.8   2398.6    -28.8    -1239.9   -1234.7     -5.2    29.3
  2    2395.5   2381.3    +14.2    -1328.8   -1330.0     +1.2    14.3
  3    2406.1   2367.6    +38.5    -1420.2   -1424.5     +4.3    38.7
  4    2373.1   2352.8    +20.3    -1508.7   -1509.1     +0.4    20.3
  5    2305.3   2333.1    -27.8    -1573.1   -1574.5     +1.4    27.8

$\hat{x}(t) = 2423.3 - 29.812t + 5.827t^2 - 0.6943t^3$

$\hat{y}(t) = -1147.7 - 79.73t - 8.761t^2 + 1.5271t^3$
Table 9. Quartic Regression - Sample III (11 points)

 t_i     x_i    x^(t_i)    e_xi

 -5    2777.0   2774.0     +3.0
 -4    2702.5   2711.0     -8.5
 -3    2617.8   2614.8     +3.0
 -2    2524.3   2516.9     +7.4
 -1    2440.0   2439.1     +0.9
  0    2385.7   2392.5     -6.8
  1    2369.8   2378.1     -8.3
  2    2395.5   2386.6     +8.9
  3    2406.1   2398.4     +7.7
  4    2373.1   2383.7    -10.6
  5    2305.3   2302.3     +3.0

$\hat{x}(t) = 2392.4 - 29.812t + 16.533t^2 - 0.6943t^3 - 0.428234t^4$






(b) Sequential differences also provide some indication of the order of
polynomial that will be required. One indicator is the number of sign changes that occur
in the successive differences of a particular order. If there are few sign changes, then
a non-random effect is indicated and a higher order polynomial will be indicated. Thus,
for example, in Sample II the 11-point data subset shows a long sequence of +'s for the
$\Delta_{2i}$'s, but no such sequence (indicating randomness) for the $\Delta_{3i}$'s.
Hence, a third-order polynomial can be expected to provide some improvement over a
second-order polynomial. This type of information may be difficult to incorporate into a
data smoothing algorithm, but even some simple procedure can be of help in reducing the
computational load.

(2) Sample size:

(a) Although it is possible that a sample of 21 points could be fitted
with acceptably small $S_e$ in some instances (the quadratic was not tried on Sample II),
it would appear that smaller samples (e.g., n=11) will allow fitting the data with a
reasonably low-order polynomial. The size n=11 is not sacrosanct but will leave some
room for elimination of outliers and so seems to be a reasonable size.

(3) Least squares smoothing:

(a) By its nature, the estimate $S_e$ for the standard deviation $\sigma$ of
the measurement noise is monotone decreasing as the order of the polynomial increases.
(An n-1 order polynomial should be able to fit n points exactly, so that $S_e$ would be
zero.) The appropriate order polynomial is one which reduces $S_e$ to the level of the
noise in the measurements. This may vary with the path and the array making the
measurements. For the portions of the path examined, it is suspected that $\sigma_y$ is
less than $\sigma_x$ since $S_{ye}$ is generally smaller than $S_{xe}$ for a given order
polynomial. The decision to use a higher-order polynomial to fit a set of data depends
upon the value of $S_e$ obtained for a given-order polynomial. If $S_e$ is small (3 or
4), then higher-order polynomials cannot be expected to give much improvement. The extent
to which $S_e$ can be reduced will depend upon the component as well as the polynomial
degree.
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(4) Outliers: 



(a) In addition to rough screening for outliers by sequential 
differences, there is additional screening that can be performed using residual errors after 
a polynomial has been fitted to the data. Outliers contributed substantially to Sg and the 
two basic techniques of reducing are elimination of points with large residuals, or 
increasing the order of the polynomial. 

(b) Elimination of outliers using residuals after smoothing can be 
accomplished in two ways: 

(1) by confidence intervals: a residual greater in magnitude than
some specified multiple (3 or larger) of $S_e$ can be considered to be an outlier, and

(2) by variance reduction: the ratio of the $S_e$'s before and after
removal of a point, or points, with substantial residuals can be used as a basis for the
decision on whether to remove the points. For example, if $S_e(\text{after})/S_e(\text{before}) \le r$,
then the points should be removed (Grubbs' criterion). The value of r is in the range 0.0 to
1.0 and could be changed depending upon the magnitude of $S_e$.
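The two residual-based screens can be sketched as follows. This is only an illustration under stated assumptions: `numpy.polyfit` stands in for the report's pivotal condensation, and the function names, the default multiple, and the default ratio r are hypothetical choices, not values from the report.

```python
import numpy as np

def fit_residuals(t, x, degree):
    """Least squares polynomial fit; returns the residuals and S_e."""
    coeffs = np.polyfit(t, x, degree)
    resid = x - np.polyval(coeffs, t)
    dof = len(t) - degree - 1                      # n - k - 1 degrees of freedom
    return resid, np.sqrt(np.sum(resid**2) / dof)

def screen_by_multiple(t, x, degree, multiple=3.0):
    """Method (1): flag residuals larger in magnitude than multiple * S_e."""
    resid, s_e = fit_residuals(t, x, degree)
    return np.abs(resid) > multiple * s_e

def screen_by_variance_ratio(t, x, degree, r=0.5):
    """Method (2): test the point with the largest residual.

    Returns its index and True when S_e(after)/S_e(before) <= r.
    """
    resid, s_before = fit_residuals(t, x, degree)
    worst = int(np.argmax(np.abs(resid)))
    keep = np.arange(len(t)) != worst
    _, s_after = fit_residuals(t[keep], x[keep], degree)
    return worst, bool(s_after <= r * s_before)
```

One design point worth noting: because $S_e$ in method (1) is computed with the suspect point included, no single residual can exceed $\sqrt{n-k-1}\,S_e$. For an 11-point sample and a straight line that bound is 3, so a multiple somewhat below 3 (or method (2)) is needed for such small samples.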

(5) Sampling rate: 

(a) The smoothing of 3-D data can be performed to provide either a
parametric representation of path segments, or specific information such as position and
velocity information, only at certain points on the path. These will be called "parametric
estimation" and "point estimation," respectively.

(b) To illustrate parametric estimation, consider data collected at
200 sequential observation times (e.g., 800 to 1,000 for the 3-D data used in this section).
Samples of 11 points will be used. Sample $S_1$ will consist of points 1 through 11, sample $S_2$
of points 10 through 20 and, in general, sample $S_j$ of points from 10(j-1) to 10j. There will
then be 20 samples on the path. Each sample of 11 points is to be fitted by a polynomial
of appropriate degree and the parameters of the polynomial together with the value of $S_e$
recorded for the path segment represented by that sample. Note that there will be two
points of overlap between $S_1$ and $S_2$ and one point of overlap thereafter.

(c) For point estimation, a sequence of points must be provided. For
data consisting of 200 points it may be considered that occasional monitoring is sufficient
for points 0 to 50 and 100 to 150, but that behavior of the path from points 50 to 100
should be monitored more often and behavior from points 150 to 200 should be followed
closely. Then the following sequence of points could be considered reasonable:



          j     Points in S_j     Midpoint

          1         5-15             10
          2        25-35             30
          3        45-55             50
          4        55-65             60
          5        65-75             70
          6        75-85             80
          7        85-95             90
          8        95-105           100
          9       115-125           120
         10       135-145           140
         11       145-155           150
         12       150-160           155
         13       155-165           160
         14       160-170           165
         15       165-175           170
         16       170-180           175
         17       175-185           180
         18       180-190           185
         19       185-195           190
         20       190-200           195

(d) At each midpoint time $t_j$, the position coordinate estimates, the
velocity in these components, the resultant velocity, and $S_{ej}$ can be recorded together
with additional information, such as acceleration components, if desired. Note that the
sequence of 20 points suggested above has substantial overlap of samples in some cases
and data gaps between samples in other cases. This was introduced intentionally since
least squares smoothing produces better estimates (smaller confidence intervals) at the
midpoint of the sample when the fitted curve is a straight line (refer to Appendix B).
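The two sampling schemes in (b) and (c) amount to simple index bookkeeping, sketched here. The function names and the `MIDPOINTS` list are illustrative; the list merely reproduces the midpoint sequence suggested in (c).

```python
def parametric_samples(n_points=200, step=10):
    """(first, last) point numbers for S_1, S_2, ... in parametric estimation."""
    samples = [(1, step + 1)]                        # S_1: points 1 through 11
    for j in range(2, n_points // step + 1):
        samples.append((step * (j - 1), step * j))   # S_j: points 10(j-1) to 10j
    return samples

def point_samples(midpoints, half_width=5):
    """(first, last) point numbers of the 11-point sample centered on each midpoint."""
    return [(m - half_width, m + half_width) for m in midpoints]

# Midpoints suggested in (c): sparse early, every 10 points from 50 to 100,
# sparse again, then every 5 points from 150 onward.
MIDPOINTS = [10, 30, 50, 60, 70, 80, 90, 100, 120, 140] + list(range(150, 200, 5))
```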

(e) Parametric estimation could also be modified to delete some
samples (e.g., alternate samples from $t_j$=100 to $t_j$=150). It would require greater
modification to achieve the quality of the point estimation procedure at other than
parametric sample midpoints when a straight line (first-order polynomial) is used. When
higher order polynomials are required, the preference for the best estimate at midpoint of
the sample is lost (refer to Appendix B). Making both techniques available provides some
flexibility in data smoothing to accommodate potential customers.

IV. A DATA SMOOTHING ALGORITHM



The following procedure is suggested for smoothing 3-D data: 

Step 1: Select appropriate sample size. (11 is suggested as being small enough to
provide some capability of fitting path segments of maneuvering torpedoes without
requiring high-order polynomials. Some leeway for dropping outliers is also provided.)

Step 2: Select parametric or point estimation.

Step 3: Select sampling rate. (A standard rate such as described in Section III S4
should be provided as a default rate for parametric estimation and the midpoints of these
samples as a default rate for point estimation.)

Step 4: Adjust data for missing data points. (The principle applied here is
minimization of the effect of the supplied numbers on sequential differences. For a single
missing datum, the average of the values at the two adjacent times will minimize the second
differences. In any case, data supplied in this step must be removed before least squares
smoothing is applied.)

Step 5: Calculate first, second, and third order sequential differences.

Step 6: Determine approximate polynomial order k. (The (k+1)st order sequential
differences should contain noise only, and thus have random signs. Sequences of 4, or
more, differences with the same sign suggest the presence of a non-random component, as
does the occurrence of 4, or fewer, changes of sign. The presence of a non-random
component is going to be awkward to identify. If the second differences are random, then
k=1. If the second-order differences are non-random, but the third-order differences are
random, then k=2. If the third-order differences are non-random, then fourth-order
differences should be calculated and examined for randomness. This examination of
sequential differences in increasing order should probably not be carried beyond the fifth
order.)
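Steps 5 and 6 can be sketched as below. This is a simplified illustration: it applies only the "4 or more differences with the same sign" rule and ignores the count of sign changes, and the names `longest_same_sign_run`, `approximate_order`, and `run_limit` are hypothetical.

```python
import numpy as np

def longest_same_sign_run(d):
    """Length of the longest run of consecutive differences with the same sign."""
    signs = np.sign(d)
    longest = run = 1
    for prev, cur in zip(signs[:-1], signs[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest

def approximate_order(x, max_order=5, run_limit=4):
    """Smallest k whose (k+1)st differences show no run of run_limit equal signs.

    Exact zeros count as a sign of their own, so noise-free polynomial data
    would need separate handling.
    """
    for k in range(1, max_order + 1):
        d = np.diff(x, n=k + 1)                 # (k+1)st order sequential differences
        if longest_same_sign_run(d) < run_limit:
            return k
    return max_order
```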
Step 7: Screen successive differences for gross outliers. (This must follow
determination of approximate degree of polynomial since it should be based on comparison
of magnitude of deviation to noise only as indicated in Tables 1 and 2. The critical values
suggested in those tables should be increased substantially. Some limit, possibly between
50 and 100, should be selected keeping in mind that this is a first screening for gross
outliers and a second screening will be made. Any outliers found in this step, however,
will reduce computations in later steps. Remove any outliers found and the observations
for the other space components at the same observation time.)

Step 8: Check for polynomial degree compatibility. (If the number of outliers
removed (r) satisfies the inequality r + k ≥ n-1, where k is the degree of polynomial found
in Step 6 and n is the sample size after data points supplied in Step 4 are removed, then
fitting a kth order polynomial will be inappropriate. For example, if r = 4 points are
removed from an 11-point sample in which one data point has been created in Step 4, then
n = 10 and only 6 points remain, so a polynomial of degree 5 can be fitted to the data without
any residual errors since there are 6 linear relationships among the 6 coefficients.)

Step 9: Fit a polynomial of degree k to the data. (The least squares procedure
outlined in Appendix A is applicable. At this step only $S_e$ need be determined and not the
coefficients.)

Step 10: Seek acceptable $S_e$. (If $S_e$ is unacceptably large, repeat Step 9 with k
replaced by k + 1. Repeat this step until either $S_e$ is acceptable or a polynomial of
degree 5 is fitted to the data.)

Step 11: Complete least squares polynomial fit. (The coefficients for the polynomial
of the degree found in Step 10 are now needed, together with the residual errors.)

Step 12: Second screening for outliers. (One of the procedures discussed in Section III
E3 should be applied to locate any outliers not found in Step 7. Remove the outliers.)

Step 13: Repeat Steps 9, 10, 11, and 12 until no more outliers are found. (The
polynomial obtained will be used for smoothing sample data. Note that the alternative
procedure of searching residuals for each polynomial degree to locate outliers may result
in removing points which are not actually outliers but legitimate observations for a higher
degree polynomial. On the other hand, the proposed method could use a higher order
polynomial to fit outliers when a lower order polynomial should actually be used. There is
a choice of the type of misfit that is acceptable.)
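The loop formed by Steps 9 through 13 might look like the following sketch for one coordinate of one sample. Several things here are assumptions rather than the report's method: `numpy.polyfit` replaces pivotal condensation, Step 12 uses the variance-reduction criterion with a hypothetical ratio `r`, and `s_ok` is a hypothetical threshold for an "acceptable" $S_e$.

```python
import numpy as np

def fit(t, x, degree):
    """Least squares polynomial fit: coefficients, residuals and S_e."""
    coeffs = np.polyfit(t, x, degree)
    resid = x - np.polyval(coeffs, t)
    s_e = np.sqrt(np.sum(resid**2) / (len(t) - degree - 1))
    return coeffs, resid, s_e

def smooth_sample(t, x, k, s_ok, max_degree=5, r=0.5):
    """Return (coeffs, S_e, kept t, kept x) after Steps 9-13."""
    t, x = np.asarray(t, float), np.asarray(x, float)
    while True:
        top = min(max_degree, len(t) - 2)            # keep at least 1 degree of freedom
        degree = k
        coeffs, resid, s_e = fit(t, x, degree)       # Step 9
        while s_e > s_ok and degree < top:           # Step 10: raise the degree
            degree += 1
            coeffs, resid, s_e = fit(t, x, degree)
        # Step 11: coeffs and resid now belong to the accepted degree.
        if s_e == 0.0 or len(t) - degree - 2 <= 0:   # nothing left to test
            return coeffs, s_e, t, x
        worst = int(np.argmax(np.abs(resid)))        # Step 12: largest residual
        keep = np.arange(len(t)) != worst
        _, _, s_after = fit(t[keep], x[keep], degree)
        if s_after > r * s_e:                        # no outlier found: finished
            return coeffs, s_e, t, x
        t, x = t[keep], x[keep]                      # Step 13: drop it and repeat
```

In this sketch a single gross outlier first drives the degree up to 5, is then removed by the ratio test, and the restart from degree k recovers the lower-order fit, which illustrates the trade-off described in Step 13.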



Step 14: Record smoothed path. (For parametric form, if specified in Step 2,
recorded data includes coefficients of fitted polynomial, $S_{ej}$ and $n_j$ for each sample $S_j$
specified in Step 3. For point estimation form, if specified in Step 2, recorded data
includes: time $t_j$, estimated coordinates $x_j = x(t_j)$, $y_j = y(t_j)$, and $z_j = z(t_j)$, velocity
components, $S_{ej}$ and $n_j$ for each point specified in Step 3. Additional path information
may also be specified; e.g., acceleration components.)

V. CONCLUSIONS AND RECOMMENDATIONS



The procedure suggested in Section IV provides a reasonable approach for 
obtaining the information desired in parts (1), (2), and (3) of Section I B. No attempt has 
been made to provide the information in part (4). 

In implementing this procedure, several parameters must be provided:

A. Sample Size (Step 1)

A smaller sample size of n=7 has been suggested. This would permit fitting
path segments containing maneuvers with lower order polynomials, but is subject to
greater degradation by missing data points and removal of outliers. Experience on
relative occurrence of such events in actual field data will be useful in selecting
appropriate sample size.

B. Choice of Parametric or Point Estimation (Step 2) and Sampling Rate (Step 3)

The desires of the customers who will use the smoothed data are of primary
concern here.



C. Specifying Approximate Polynomial Order (Step 6)

It will be difficult to specify a simple rule for determining that the kth order
sequential differences contain non-random components but the (k+1)st order differences
involve only random components. The Theory of Runs can be of some help here, although a
simpler rule is desirable; this needs further study.

D. Rough Screening For Outliers (Step 7)

A reasonable critical level for identifying outliers by sequential differences
must be established. The occurrence of an isolated outlier was considered in Section II 3.
Other potential producers of large sequential differences such as paired outliers, violent
changes in velocity, et cetera, should be examined for resultant effects. Identification of
signatures for such effects will be useful in using sequential differences to identify
outliers.

E. Polynomial Degree Limitations (Step 10)

The limitation of polynomial degree to 5, or less, appears reasonable for
samples of size 11. The possibility of decreasing this limit to 4 or increasing it to 6 or
higher should be considered. This may require more experience with in-water run data.
For smaller sample sizes, such as n=7, reduction of this limit to lower polynomial degree
should be considered.

F. Computing Smoothed Path (Step 11) 

The pivotal condensation method outlined in Appendix A can be simplified
even further in certain cases which may occur frequently enough to take advantage of
their commonality in the computer program. In particular, when the sample consists of
n=11 data points at adjacent times, the shift of the time origin to the midpoint of the
sample produces the following effects:

(1) coefficients of the polynomial parameters are the same in the normal
equations for all samples,

(2) only the last column in the pivotal condensation format changes with
sample, and

(3) the other columns in the pivotal condensation format require only
addition of a row and a column in each box when the next higher degree polynomial is
considered.

The above commonality is also clearly evident in the vector representation presented in
Appendix A. The extent to which this commonality can be exploited depends primarily
upon the rarity of missing data points and outliers. Indeed, depending upon requirements
of the ultimate users, data smoothing could conceivably be restricted to only such
samples.
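In the vector form of Appendix A this commonality means that, for complete 11-point samples with times shifted to -5, ..., 5, the operator $(X^T X)^{-1} X^T$ depends only on the polynomial degree and can be computed once. A minimal sketch, assuming NumPy and hypothetical names `smoothing_operator`, `OPERATORS`, and `fit_sample`:

```python
import numpy as np

HALF = 5
T = np.arange(-HALF, HALF + 1, dtype=float)          # shifted times -5, ..., 5

def smoothing_operator(degree):
    """(X^T X)^{-1} X^T for a complete 11-point sample; independent of the data."""
    X = np.vander(T, degree + 1, increasing=True)    # columns 1, t, t^2, ...
    return np.linalg.solve(X.T @ X, X.T)

# One operator per admissible degree, shared by every complete sample.
OPERATORS = {k: smoothing_operator(k) for k in range(1, 6)}

def fit_sample(x, degree):
    """Coefficients a_0, ..., a_k for one complete 11-point sample."""
    return OPERATORS[degree] @ np.asarray(x, dtype=float)
```

Only the data vector changes from sample to sample, which corresponds to effect (2) above; samples with missing points or removed outliers would still need the general procedure.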



In summary, the data smoothing algorithm presented in Section IV appears 
reasonable, but there are several elements that must be specified before it can be 
implemented. Some of these can be improved by further research, others depend upon the 
quality of the data which can only be determined by experience with actual 3-D data. 
Finally, some of them can only be determined in consultation with the ultimate users of 
the smoothed data.

APPENDIX A

LEAST SQUARES DATA SMOOTHING



A-1 LINEAR LEAST SQUARES WITH ONE PREDICTOR

Sample:

$$(x_i, y_i), \quad i = 1, 2, \ldots, n$$

Assumptions:

A1: Actual relationship between X and Y is linear, i.e.,

$$y(x) = \alpha + \beta x$$

A2: Abscissas are without errors,

$$X_i = x_i$$

A3: Ordinates contain measurement or observational errors,

$$y_i = y(x_i) + \epsilon_i$$

where $y_i$ is the observed value, $y(x_i)$ is the actual value, and $\epsilon_i$ is the observational error.

Problem:

Fit a straight line to the data.

Engineer's Solution:

$$\hat{y}(x) = a + bx$$

$$e_i = y_i - \hat{y}(x_i)$$

$$D = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$

The coefficients a and b are selected to minimize D (the sum of squares of the deviations of the
observed $y_i$'s from the fitted line). Setting

$$\frac{\partial D}{\partial a} = 0 \quad \text{and} \quad \frac{\partial D}{\partial b} = 0$$

gives the two equations

$$n a + \left(\sum x_i\right) b = \sum y_i$$

$$\left(\sum x_i\right) a + \left(\sum x_i^2\right) b = \sum x_i y_i$$

Solving these equations yields the desired estimates a and b for the parameters $\alpha$ and $\beta$, i.e.,

$$b = \frac{n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$

$$a = \frac{\sum y_i - b \sum x_i}{n}$$

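Computational Format:

The following format uses pivotal condensation to produce a and b. It also yields D and hence
the sample variance

$$S_e^2 = \frac{\sum e_i^2}{n-2} = \frac{D}{n-2}$$

without requiring calculation of the individual $e_i$'s.

     n        Σx_i        Σy_i
              Σx_i²       Σx_i y_i
                          Σy_i²
     -------------------------------
              A_xx        A_xy
                          A_yy
     -------------------------------
                          D
     -------------------------------
     a        b           S_e²

where

$$A_{xx} = \left[n \sum x_i^2 - \left(\sum x_i\right)^2\right]/n$$

$$A_{xy} = \left[n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)\right]/n$$

$$A_{yy} = \left[n \sum y_i^2 - \left(\sum y_i\right)^2\right]/n$$

$$D = A_{yy} - A_{xy}^2 / A_{xx}$$

$$S_e^2 = D/(n-2)$$

$$b = A_{xy}/A_{xx}$$

$$a = \left[\sum y_i - b \sum x_i\right]/n$$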
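The one-predictor format translates directly into a few lines of code. The sketch below uses the shifted Sample II values tabulated in Appendix B as input; the function name `straight_line_fit` is illustrative.

```python
def straight_line_fit(x, y):
    """One-predictor pivotal condensation: returns a, b, D and S_e^2."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(u * u for u in x)
    sxy = sum(u * v for u, v in zip(x, y))
    syy = sum(v * v for v in y)
    a_xx = (n * sxx - sx * sx) / n
    a_xy = (n * sxy - sx * sy) / n
    a_yy = (n * syy - sy * sy) / n
    b = a_xy / a_xx
    a = (sy - b * sx) / n
    d = a_yy - a_xy ** 2 / a_xx          # sum of squared residuals, no e_i needed
    return a, b, d, d / (n - 2)

# Shifted Sample II data from Appendix B: t_i = -5..5, x_i = x_i' - 2554.4
T = list(range(-5, 6))
X = [-371.2, -312.6, -248.9, -177.3, -102.8, -20.6,
     65.2, 153.3, 245.2, 337.2, 432.9]
a, b, d, s2 = straight_line_fit(T, X)
```

With these inputs the results agree with the values tabulated in B-1 (slope about 81.19, D about 3719.62, $S_e^2$ about 413.29).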



Statistician's Solution:

Statisticians augment the Engineer's Solution by adding the following assumptions:

A4: The observational errors (the $\epsilon_i$'s) are realizations of independent random
variables, $E_i$'s, with zero means and common variance, i.e.,

$$\mathcal{E}(E_i) = 0 \quad \text{and} \quad \mathrm{Var}(E_i) = \sigma^2 \quad \text{for all } i.$$

A5: The observational errors are normally distributed random variables. This will be denoted by

$$E_i \sim N(0, \sigma^2)$$

Now let $y_i$ denote the realization of the random variable $Y_i$. Then

$$Y_i = y(x_i) + E_i$$

and

$$\mathcal{E}(Y_i) = y(x_i)$$

Further, the fitted line can be expressed as a random variable in the form

$$\hat{Y}(x) = A + Bx$$

where

$$B = \frac{n \sum x_i Y_i - \left(\sum x_i\right)\left(\sum Y_i\right)}{n \sum x_i^2 - \left(\sum x_i\right)^2} = \frac{A_{xY}}{A_{xx}} \quad \text{and} \quad A = \frac{\sum Y_i - B \sum x_i}{n}$$

Note that A and B are linear functions of the $Y_i$'s and hence of the $E_i$'s. It can now be shown that

$$\mu_A = \mathcal{E}(A) = \alpha \quad \text{and} \quad \mu_B = \mathcal{E}(B) = \beta$$

so that a and b are unbiased estimators for $\alpha$ and $\beta$. The evaluation of the variances of
$\hat{Y}(x)$, A and B is simplified if the $x_i$'s are shifted so that their mean is zero. Then, since

$$\sum x_i = 0,$$

$$A_{xx} = \sum x_i^2, \qquad A_{xy} = \sum x_i y_i, \qquad b = \frac{\sum x_i y_i}{\sum x_i^2}, \qquad a = \frac{\sum y_i}{n}$$

This shift in the x-axis will be assumed in the development which follows.

It can now be demonstrated that

$$\sigma_A^2 = \sigma^2/n,$$

$$\sigma_B^2 = \sigma^2 / \sum x_i^2,$$

$$\mathrm{Cov}(A, B) = \mathcal{E}[(A - \alpha)(B - \beta)] = 0,$$

$$\sigma_{\hat{Y}(x)}^2 = \sigma^2 \left(\frac{1}{n} + \frac{x^2}{\sum x_i^2}\right)$$

and

$$\mathcal{E}(S_e^2) = \sigma^2$$

The last relationship is very important since $S_e^2$ is an unbiased estimator of $\sigma^2$ and is our
only source of information on this parameter.

The assumption of normality (A5) together with linearity of the other random variables in the
$E_i$'s insures that they are also normally distributed, i.e.,

$$Y_i \sim N(y(x_i), \sigma^2),$$

$$A \sim N(\alpha, \sigma^2/n),$$

$$B \sim N\left(\beta, \sigma^2 / \sum x_i^2\right), \quad \text{and}$$

$$\hat{Y}(x) \sim N\left[y(x),\ \sigma^2 \left(\frac{1}{n} + \frac{x^2}{\sum x_i^2}\right)\right]$$

The random variable $(n-2) S_e^2 / \sigma^2$ has a Chi-Square distribution with n-2 degrees of freedom
and the random variables

$$T_A = \frac{\sqrt{n}\,(A - \alpha)}{S_e},$$

$$T_B = \frac{(B - \beta)\sqrt{\sum x_i^2}}{S_e},$$

and

$$T_{\hat{Y}} = \frac{\hat{Y}(x) - y(x)}{S_e \sqrt{\dfrac{1}{n} + \dfrac{x^2}{\sum x_i^2}}}$$

have a Student-T distribution with n-2 degrees of freedom.

These distributions can then be used to establish confidence intervals for $\alpha$, $\beta$, and y(x)
at any x. Thus, for example, with k from Student-T tables such that

$$P(-k < T < +k) = .95$$

we have the following 95% confidence intervals:

$$\left(a - \frac{k S_e}{\sqrt{n}},\ a + \frac{k S_e}{\sqrt{n}}\right)$$

for $\alpha$,

$$\left(b - \frac{k S_e}{\sqrt{\sum x_i^2}},\ b + \frac{k S_e}{\sqrt{\sum x_i^2}}\right)$$

for $\beta$, and

$$\left(\hat{y}(x) - k S_e \sqrt{\frac{1}{n} + \frac{x^2}{\sum x_i^2}},\ \hat{y}(x) + k S_e \sqrt{\frac{1}{n} + \frac{x^2}{\sum x_i^2}}\right)$$

for y(x) at any x. It should be stressed that the confidence interval for y(x) given above
involves measurements about the mean of x ($\bar{x} = 0$). The general form for this confidence
interval is

$$\left(\hat{y}(x) - k S_e \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}},\ \hat{y}(x) + k S_e \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}}\right)$$

It should be noted that the confidence interval for y(x) is shortest for $x = \bar{x}$ and increases
as x deviates from this value.

A sketch of the situation can help clarify the mathematical elements involved.

[Sketch: actual and fitted lines in the x-y plane, with the following elements labeled.]

$y = \alpha + \beta x$ = actual linear relationship
$\hat{y} = a + bx$ = fitted line
$\epsilon_i$ = observational error
$e_i$ = fitting error
$e(x)$ = prediction error at any x
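LINEAR LEAST SQUARES WITH TWO PREDICTORS

Sample:

$$(x_{1i}, x_{2i}, y_i), \quad i = 1, 2, \ldots, n$$

Assumptions:

A1: $y(x) = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2$

A2: $X_{1i} = x_{1i}$ and $X_{2i} = x_{2i}$

A3: $y_i = y(x_i) + \epsilon_i$, where $x_i = (x_{1i}, x_{2i})$

Engineers' Solution:

Let

$$\hat{y}(x) = a_0 + a_1 x_1 + a_2 x_2$$

$$e_i = y_i - \hat{y}(x_i)$$

$$D = \sum e_i^2 = \sum (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})^2$$

Minimizing D,

$$\frac{\partial D}{\partial a_0} = 0, \quad \frac{\partial D}{\partial a_1} = 0, \quad \frac{\partial D}{\partial a_2} = 0,$$

produces the normal equations

$$n a_0 + \left(\sum x_{1i}\right) a_1 + \left(\sum x_{2i}\right) a_2 = \sum y_i \qquad (1)$$

$$\left(\sum x_{1i}\right) a_0 + \left(\sum x_{1i}^2\right) a_1 + \left(\sum x_{1i} x_{2i}\right) a_2 = \sum x_{1i} y_i \qquad (2)$$

$$\left(\sum x_{2i}\right) a_0 + \left(\sum x_{1i} x_{2i}\right) a_1 + \left(\sum x_{2i}^2\right) a_2 = \sum x_{2i} y_i \qquad (3)$$

which can be solved for $a_0$, $a_1$, and $a_2$ in terms of sample data. Solving (1) for $a_0$ gives

$$a_0 = \left[\sum y_i - \left(\sum x_{1i}\right) a_1 - \left(\sum x_{2i}\right) a_2\right]/n \qquad (1')$$

Substituting (1') in (2) and (3) gives

$$A_{11} a_1 + A_{12} a_2 = A_{1y} \qquad (2')$$

$$A_{12} a_1 + A_{22} a_2 = A_{2y} \qquad (3')$$

Solving (2') for $a_1$ gives

$$a_1 = (A_{1y} - a_2 A_{12})/A_{11} \qquad (2'')$$

and substituting in (3') gives

$$B_{22} a_2 = B_{2y} \qquad (3'')$$

where the coefficients will be defined in the computational format which follows. Equations
(3''), (2'') and (1') can be used to determine the values of $a_2$, $a_1$, and $a_0$.

COMPUTATIONAL FORMAT

     n      Σx_1i       Σx_2i         Σy_i
            Σx_1i²      Σx_1i x_2i    Σx_1i y_i
                        Σx_2i²        Σx_2i y_i
                                      Σy_i²
     ---------------------------------------------
            A_11        A_12          A_1y
                        A_22          A_2y
                                      A_yy
     ---------------------------------------------
                        B_22          B_2y
                                      B_yy
     ---------------------------------------------
                                      D_e
     ---------------------------------------------
     a_0    a_1         a_2           S_e²

where, with u and v ranging over 1, 2, and y,

$$A_{uv} = \left[n \sum x_{ui} x_{vi} - \left(\sum x_{ui}\right)\left(\sum x_{vi}\right)\right]/n$$

$$B_{uv} = \left[A_{11} A_{uv} - A_{1u} A_{1v}\right]/A_{11}, \quad \text{e.g.,} \quad B_{22} = \left[A_{11} A_{22} - A_{12}^2\right]/A_{11}$$

$$D_e = \left[B_{22} B_{yy} - B_{2y}^2\right]/B_{22}$$

$$S_e^2 = \sum e_i^2/(n-3) = D_e/(n-3)$$

$$a_2 = B_{2y}/B_{22}$$

$$a_1 = \left[A_{1y} - a_2 A_{12}\right]/A_{11}$$

$$a_0 = \left[\sum y_i - a_2 \sum x_{2i} - a_1 \sum x_{1i}\right]/n$$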



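Statisticians' Solution

Assumptions A4 and A5 lead to the following random variables and their distributions:

$$E = \text{observational error in y at } (x_1, x_2) \sim N(0, \sigma^2)$$

$$Y(x_1, x_2) = y(x_1, x_2) + E \sim N(y(x_1, x_2), \sigma^2)$$

$$A_2 = B_{2Y}/B_{22} \sim N\left(\alpha_2,\ \frac{\sigma^2 A_{11}}{A_{11} A_{22} - A_{12}^2}\right)$$

$$A_1 = \left[A_{1Y} - A_2 A_{12}\right]/A_{11} \sim N\left(\alpha_1,\ \frac{\sigma^2 A_{22}}{A_{11} A_{22} - A_{12}^2}\right)$$

$$A_0 = \left[\sum Y_i - A_1 \sum x_{1i} - A_2 \sum x_{2i}\right]/n \sim N(\alpha_0, \sigma^2/n)$$

Also

$$\mathrm{Cov}(A_0, A_1) = \mathrm{Cov}(A_0, A_2) = 0$$

$$\mathrm{Cov}(A_1, A_2) = \frac{-A_{12}\,\sigma^2}{A_{11} A_{22} - A_{12}^2}$$

Then for a predicted value $\hat{Y}(x_1, x_2)$ at any point $(x_1, x_2)$ we have

$$\sigma_{\hat{Y}}^2 = \sigma^2 \left[\frac{1}{n} + \left(\frac{1}{A_{11}} + \frac{A_{12}^2}{A_{11}^2 B_{22}}\right) x_1^2 - 2\left(\frac{A_{12}}{A_{11} B_{22}}\right) x_1 x_2 + \frac{x_2^2}{B_{22}}\right]$$

This together with

$$\mu_{\hat{Y}} = \mathcal{E}(\hat{Y}) = y(x_1, x_2)$$

and the fact that

$$\hat{Y}(x_1, x_2) \sim N(\mu_{\hat{Y}}, \sigma_{\hat{Y}}^2)$$

can be used to establish confidence intervals for $y(x_1, x_2)$.

CAUTION: In deriving these formulas it was assumed that $\bar{x}_1 = \bar{x}_2 = 0$. For data in which this
shift has not been made, the formulae must be adjusted.

Quadratic Model

The quadratic model

$$y(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2$$

can be transformed into a linear model with two predictors by the transformation

$$x_1 = x, \qquad x_2 = x^2$$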
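LINEAR LEAST SQUARES WITH THREE PREDICTORS

Sample:

$$(x_{1i}, x_{2i}, x_{3i}, y_i), \quad i = 1, 2, \ldots, n$$

Assumptions:

A1: $y(x_1, x_2, x_3) = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3$

A2: $X_{ji} = x_{ji}$, j = 1, 2, 3

A3: $y_i = y(x_{1i}, x_{2i}, x_{3i}) + \epsilon_i$

Computational Format

     n      Σx_1i     Σx_2i         Σx_3i         Σy_i
            Σx_1i²    Σx_1i x_2i    Σx_1i x_3i    Σx_1i y_i
                      Σx_2i²        Σx_2i x_3i    Σx_2i y_i
                                    Σx_3i²        Σx_3i y_i
                                                  Σy_i²
     --------------------------------------------------------
            A_11      A_12          A_13          A_1y
                      A_22          A_23          A_2y
                                    A_33          A_3y
                                                  A_yy
     --------------------------------------------------------
                      B_22          B_23          B_2y
                                    B_33          B_3y
                                                  B_yy
     --------------------------------------------------------
                                    C_33          C_3y
                                                  C_yy
     --------------------------------------------------------
                                                  D_e
     --------------------------------------------------------
     a_0    a_1       a_2           a_3           S_e²

where, with u and v ranging over 1, 2, 3, and y,

$$A_{uv} = \left[n \sum x_{ui} x_{vi} - \left(\sum x_{ui}\right)\left(\sum x_{vi}\right)\right]/n$$

$$B_{uv} = (A_{11} A_{uv} - A_{1u} A_{1v})/A_{11}$$

$$C_{uv} = (B_{22} B_{uv} - B_{2u} B_{2v})/B_{22}$$

$$D_e = (C_{33} C_{yy} - C_{3y}^2)/C_{33}$$

$$S_e^2 = D_e/(n-4) = \sum e_i^2/(n-4)$$

$$a_3 = C_{3y}/C_{33}$$

$$a_2 = (B_{2y} - a_3 B_{23})/B_{22}$$

$$a_1 = (A_{1y} - a_3 A_{13} - a_2 A_{12})/A_{11}$$

$$a_0 = \left(\sum y_i - a_3 \sum x_{3i} - a_2 \sum x_{2i} - a_1 \sum x_{1i}\right)/n$$

$$\hat{y}(x_1, x_2, x_3) = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3$$

Statistics

$$\hat{Y}(x_1, x_2, x_3) = A_0 + A_1 x_1 + A_2 x_2 + A_3 x_3 = \text{Prediction Equation}$$

It can be seen that the $A_j$'s, and hence $\hat{Y}(x_1, x_2, x_3)$, are normally distributed.
Determining their means and variances is quite mathematically involved and will be
delayed until the vector solution is considered.

Cubic Polynomial

$$y(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \alpha_3 x^3$$

Transformation

$$x_1 = x, \qquad x_2 = x^2, \qquad x_3 = x^3$$

$$y(x) = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3$$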
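LINEAR LEAST SQUARES WITH k PREDICTORS

Sample Data:

$$(x_{1i}, x_{2i}, \ldots, x_{ki}, y_i), \quad i = 1, \ldots, n$$

Linear Model:

$$y = \alpha_0 + \alpha_1 x_1 + \ldots + \alpha_k x_k$$

Prediction:

$$\hat{y} = a_0 + a_1 x_1 + \ldots + a_k x_k$$

Computational Format

     n     Σx_1i     Σx_2i        ...   Σx_ki         Σy_i
           Σx_1i²    Σx_1i x_2i   ...   Σx_1i x_ki    Σx_1i y_i
                     Σx_2i²       ...   Σx_2i x_ki    Σx_2i y_i
                                  ...
                                        Σx_ki²        Σx_ki y_i
                                                      Σy_i²
     ------------------------------------------------------------
           A_11      A_12         ...   A_1k          A_1y
                     A_22         ...   A_2k          A_2y
                                  ...
                                        A_kk          A_ky
                                                      A_yy
     ------------------------------------------------------------
                     A_2.22       ...   A_2.2k        A_2.2y
                     A_2.33       ...   A_2.3k        A_2.3y
                                  ...
                                        A_2.kk        A_2.ky
                                                      A_2.yy
     ------------------------------------------------------------
                     A_3.33       ...   A_3.3k        A_3.3y
                                  ...
                                        A_3.kk        A_3.ky
                                                      A_3.yy
     ------------------------------------------------------------
                                  ...
     ------------------------------------------------------------
                                                      A_k.yy
     ------------------------------------------------------------
                                                      D_e
     ------------------------------------------------------------
     a_0   a_1       ...          a_(k-1)   a_k

$$S_e^2 = D_e/(n-k-1)$$

Here the $A_{2.uv}$ and $A_{3.uv}$ are the successive condensations denoted $B_{uv}$ and $C_{uv}$ in the
formats for two and three predictors.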
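LINEAR LEAST SQUARES IN VECTOR FORM

This will be presented for k predictors $(x_1, \ldots, x_k)$. Let the sample data be
$(x_{1i}, x_{2i}, \ldots, x_{ki}, y_i)$ with i = 1, ..., n. This data can be presented as a vector
$\mathbf{y}$ and a matrix $\mathbf{X}$ where

$$\mathbf{y}^T = (y_1, \ldots, y_n), \qquad \mathbf{X} = \begin{pmatrix} x_{01} & x_{11} & \cdots & x_{k1} \\ \vdots & \vdots & & \vdots \\ x_{0n} & x_{1n} & \cdots & x_{kn} \end{pmatrix}$$

where $x_0 = 1$. The linear model then takes the form

$$y = y(x_1, \ldots, x_k) = \boldsymbol{\alpha}^T \mathbf{x} = \sum_{j=0}^{k} \alpha_j x_j$$

where $\boldsymbol{\alpha}^T$ denotes the row vector which is the transpose of $\boldsymbol{\alpha}$, i.e.,

$$\boldsymbol{\alpha}^T = (\alpha_0, \alpha_1, \ldots, \alpha_k)$$

The fitted equation is

$$\hat{y} = \hat{y}(x_1, \ldots, x_k) = \mathbf{a}^T \mathbf{x} = \sum_{j=0}^{k} a_j x_j$$

where the $a_j$'s are established to minimize

$$D = \sum e_i^2$$

with

$$e_i = y_i - \hat{y}(\mathbf{x}_i) = y_i - \sum_{j=0}^{k} a_j x_{ji}$$

In vector form, we have

$$\mathbf{e} = \mathbf{y} - \mathbf{X}\mathbf{a}$$

so that

$$D = \mathbf{e}^T \mathbf{e}$$

The normal equations (to minimize D) are

$$\mathbf{X}^T \mathbf{X}\, \mathbf{a} = \mathbf{X}^T \mathbf{y}$$

with the solution

$$\mathbf{a} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$$

Expressing the coefficients in terms of random variables, we have

$$\mathbf{A} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y}$$

where

$$\mathbf{Y}^T = (Y_1, \ldots, Y_n) = (\mathbf{y} + \mathbf{E})^T = \mathbf{y}^T + \mathbf{E}^T$$

with

$$\mathbf{E}^T = (E_1, \ldots, E_n)$$

and, in this part of the development, $\mathbf{y}$ denoting the vector of actual values,

$$\mathbf{y}^T = (y_1, \ldots, y_n) = (\mathbf{X}\boldsymbol{\alpha})^T$$

$$\mathbf{Y} = \mathbf{y} + \mathbf{E}$$

Using

$$\mathcal{E}(\mathbf{E}) = \mathbf{0}$$

$$\mathcal{E}(\mathbf{E}\mathbf{E}^T) = \mathbf{I}\sigma^2$$

where $\mathbf{I}$ is the n x n identity matrix, we have

$$\mathcal{E}(\mathbf{Y}) = \mathbf{y}$$

$$\mathcal{E}(\mathbf{Y}\mathbf{Y}^T) = \mathbf{I}\sigma^2 + \mathbf{y}\mathbf{y}^T$$

and hence the covariance matrix for $\mathbf{Y}$ is

$$\mathrm{Cov}(\mathbf{Y}, \mathbf{Y}^T) = \mathcal{E}(\mathbf{Y}\mathbf{Y}^T) - \mathbf{y}\mathbf{y}^T = \mathbf{I}\sigma^2$$

Then

$$\mathcal{E}(\mathbf{A}) = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathcal{E}(\mathbf{Y}) = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{X} \boldsymbol{\alpha} = \boldsymbol{\alpha}$$

Thus $\mathbf{a}$ provides unbiased estimates for the elements of $\boldsymbol{\alpha}$.

For the variances and covariances of the coefficients we have

$$\mathrm{Cov}(\mathbf{A}, \mathbf{A}^T) = (\mathbf{X}^T \mathbf{X})^{-1} \sigma^2$$

Finally, for $\hat{Y}$ at any $\mathbf{x}$ we have

$$\mathcal{E}(\hat{Y}) = y$$

and

$$\sigma_{\hat{Y}}^2 = \mathbf{x}^T\, \mathrm{Cov}(\mathbf{A}, \mathbf{A}^T)\, \mathbf{x} = \mathbf{x}^T (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{x}\, \sigma^2$$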
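The vector form maps almost one-to-one onto matrix code. The sketch below assumes NumPy; the function names are illustrative, and $\sigma^2$ is replaced by its estimate $S_e^2$ when the prediction variance is evaluated.

```python
import numpy as np

def vector_least_squares(X, y):
    """Solve the normal equations; returns a, (X^T X)^{-1} and S_e^2."""
    n, p = X.shape                       # p = k + 1 columns, including x_0 = 1
    xtx_inv = np.linalg.inv(X.T @ X)
    a = xtx_inv @ X.T @ y                # a = (X^T X)^{-1} X^T y
    e = y - X @ a                        # residual vector e = y - X a
    s2 = float(e @ e) / (n - p)          # S_e^2 = D / (n - k - 1)
    return a, xtx_inv, s2

def prediction_variance(x_row, xtx_inv, s2):
    """Estimate of sigma_Yhat^2 = x^T (X^T X)^{-1} x sigma^2 at the point x_row."""
    return s2 * float(x_row @ xtx_inv @ x_row)
```

For a straight line fitted at shifted times -5, ..., 5, $(X^T X)^{-1}$ is diagonal with entries 1/11 and 1/110, which reproduces the one-predictor variances $\sigma^2/n$ and $\sigma^2/\sum x_i^2$.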
APPENDIX B

SAMPLE LEAST SQUARES CALCULATION

B-1. STRAIGHT LINE REGRESSION FOR SAMPLE II

       i     t_i'     x_i'       t_i     x_i       x̂_i       e_xi

       1     872     2183.2      -5     -371.2    -405.9     +34.7
       2     873     2241.8      -4     -312.6    -324.7     +12.1
       3     874     2305.5      -3     -248.9    -243.5      -5.4
       4     875     2377.1      -2     -177.3    -162.3     -15.0
       5     876     2451.6      -1     -102.8     -81.2     -21.6
       6     877     2533.8       0      -20.6       0.0     -20.6
       7     878     2619.6       1       65.2      81.2     -16.0
       8     879     2707.7       2      153.3     162.4      -9.1
       9     880     2799.6       3      245.2     243.6      +1.6
      10     881     2891.6       4      337.2     324.8     +12.4
      11     882     2987.3       5      432.9     406.0     +26.9

     SUM            28098.8       0        0.4       0.4       0.0

$$\bar{x} = \frac{1}{n} \sum x_i' = 2554.44, \qquad x_i = x_i' - \bar{x}, \qquad t_i = t_i' - 877$$

$$n = 11$$

$$\sum t_i = 0, \qquad \sum t_i^2 = 110$$

$$\sum x_i = 0.4, \qquad \sum t_i x_i = 8{,}931.2, \qquad \sum x_i^2 = 728{,}868.12$$

$$A_{11} = 110, \qquad A_{1x} = 8{,}931.2, \qquad A_{xx} = 728{,}868.11$$

$$A_{ee} = 3719.62$$

$$a_0 = 0.04, \qquad a_1 = 81.19, \qquad S_e^2 = A_{ee}/(n-2) = 413.29, \qquad S_e = 20.33$$

$$\hat{X}(t) = a_0 + a_1 t = 0.04 + 81.19t$$

B-2. QUADRATIC REGRESSION FOR SAMPLE II

     t_1i = t_i     t_2i = t_i²     x_i       x̂_i       e_xi

        -5             25         -371.2    -375.1      +3.9
        -4             16         -312.6    -312.4      -0.2
        -3              9         -248.9    -245.6      -3.3
        -2              4         -177.3    -174.7      -2.6
        -1              1         -102.8     -99.7      -3.1
         0              0          -20.6     -20.5      -0.1
         1              1           65.2     +62.7      +2.5
         2              4          153.3    +150.1      +3.2
         3              9          245.2    +241.6      +3.6
         4             16          337.2    +337.1      +0.1
         5             25          432.9    +436.8      -3.9

     SUM 0            110            0.4       0.3       0.1

$$n = 11$$

$$\sum t_{1i} = 0, \qquad \sum t_{2i} = 110, \qquad \sum x_i = 0.4$$

$$\sum t_{1i}^2 = 110, \qquad \sum t_{1i} t_{2i} = 0, \qquad \sum t_{1i} x_i = 8{,}931.2$$

$$\sum t_{2i}^2 = 1958, \qquad \sum t_{2i} x_i = 1769.2, \qquad \sum x_i^2 = 728{,}868.1$$

$$A_{11} = 110, \qquad A_{12} = 0, \qquad A_{1x} = 8{,}931.2$$

$$A_{22} = 858, \qquad A_{2x} = 1765.2, \qquad A_{xx} = 728{,}868.11$$

$$A_{2.22} = 858, \qquad A_{2.2x} = 1765.2, \qquad A_{2.xx} = 3719.62$$

$$A_{ee} = 87.999$$

$$a_0 = -20.53, \qquad a_1 = 81.19, \qquad a_2 = 2.057$$

$$S_e^2 = A_{ee}/8 = 11.00, \qquad S_e = 3.32$$

$$\hat{X}(t) = -20.53 + 81.19t + 2.057t^2$$
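The quadratic regression for Sample II can be checked independently with a general-purpose fit. The sketch below uses `numpy.polyfit` rather than the pivotal condensation format, with the shifted data taken from the table above; the variable names are illustrative.

```python
import numpy as np

# Shifted Sample II data: t_i = -5..5, x_i = x_i' - 2554.4
t = np.arange(-5.0, 6.0)
x = np.array([-371.2, -312.6, -248.9, -177.3, -102.8, -20.6,
              65.2, 153.3, 245.2, 337.2, 432.9])

a2, a1, a0 = np.polyfit(t, x, 2)               # polyfit returns highest power first
resid = x - (a0 + a1 * t + a2 * t**2)
a_ee = float(np.sum(resid**2))                 # sum of squared residuals
s_e = np.sqrt(a_ee / (len(t) - 3))             # n - 3 = 8 degrees of freedom
```

The fitted coefficients and $S_e$ agree with the tabulated values to the precision shown there.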
CONFIDENCE INTERVALS FOR ESTIMATES

In Appendix A, it was shown that the confidence intervals for x(t) at any time t had the form

$$\left(\hat{X}(t) - C_j(t) S_{je},\ \hat{X}(t) + C_j(t) S_{je}\right), \quad j = 1, 2$$

where

$$C_1(t) = k_1 \sqrt{\frac{1}{n} + \frac{t^2}{A_{11}}}$$

and

$$C_2(t) = k_2 \sqrt{\frac{1}{n} + \left(\frac{1}{A_{11}} + \frac{A_{12}^2}{A_{11}^2 A_{2.22}}\right) t_1^2 - 2\left(\frac{A_{12}}{A_{11} A_{2.22}}\right) t_1 (t_2 - \bar{t}_2) + \frac{(t_2 - \bar{t}_2)^2}{A_{2.22}}}$$

are the appropriate terms for the linear and quadratic regression curves, respectively. For a
95% confidence level and n-2 or n-3 degrees of freedom for the Student-T distribution we find
$k_1 = 1.833$ and $k_2 = 1.860$. (These are the upper 5% points of the Student-T distribution with
9 and 8 degrees of freedom, so the two-sided intervals computed from them have 90% coverage.)
Introducing the numerical values determined in the preceding sections of this appendix, we find

$$C_1(t) = 1.833 \sqrt{\frac{1}{11} + \frac{t^2}{110}}$$

$$C_2(t) = 1.860 \sqrt{\frac{1}{11} + \frac{t^2}{110} + \frac{(t^2 - 10)^2}{858}}$$

The relationships of $C_1(t)$ and $C_2(t)$ and the increments $S_{1e} C_1(t)$ and $S_{2e} C_2(t)$ are shown
below using $S_{1e} = 20.33$ and $S_{2e} = 3.32$.

        t       C_1(t)     C_2(t)     S_1e C_1(t)     S_2e C_2(t)

        0        .553       .847        11.24            2.81
       ±1        .580       .820        11.79            2.72
       ±2        .654       .765        13.30            2.54
       ±3        .762       .776        15.49            2.58
       ±4        .891       .981        18.11            3.26
       ±5       1.034      1.417        21.02            4.70

The confidence interval for x(t) is shortest at t=0 (the sample midpoint) for the linear
regression. To find the value of t in the quadratic regression for which the confidence
interval is shortest, consider

$$z = \frac{1}{11} + \frac{t^2}{110} + \frac{(t^2 - 10)^2}{858}$$

now

$$\frac{dz}{d(t^2)} = \frac{1}{110} + \frac{2(t^2 - 10)}{858} = 0$$

$$220\, t^2 = 2200 - 858 = 1342$$

$$t^2 = 6.10$$

$$t = 2.47$$

$$C_2(2.47) = 0.7535$$
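The tabulated coefficients and the location of the shortest quadratic interval can be reproduced with a few lines. The function names and keyword defaults below are illustrative; the defaults are the Sample II values used in this appendix.

```python
import math

def c1(t, k1=1.833, n=11, a11=110.0):
    """Half-width multiplier for the straight-line fit."""
    return k1 * math.sqrt(1.0 / n + t * t / a11)

def c2(t, k2=1.860, n=11, a11=110.0, a222=858.0, t2_mean=10.0):
    """Half-width multiplier for the quadratic fit (A_12 = 0 for Sample II)."""
    return k2 * math.sqrt(1.0 / n + t * t / a11 + (t * t - t2_mean) ** 2 / a222)

# Setting dz/d(t^2) = 1/A_11 + 2(t^2 - mean)/A_2.22 = 0 gives
# t^2 = mean - A_2.22 / (2 A_11) = 10 - 858/220 = 6.1
t_min = math.sqrt(10.0 - 858.0 / (2.0 * 110.0))
```

Evaluating `c1` and `c2` at t = 0, 1, ..., 5 reproduces the table, and `c2(t_min)` gives the minimum quoted for the quadratic regression.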